Skip to content

fix: skip false-positive fork detection after pickle in BQ analytics plugin#5528

Closed
caohy1988 wants to merge 2 commits intogoogle:mainfrom
caohy1988:fix/bqaa-false-fork-detection
Closed

fix: skip false-positive fork detection after pickle in BQ analytics plugin#5528
caohy1988 wants to merge 2 commits intogoogle:mainfrom
caohy1988:fix/bqaa-false-fork-detection

Conversation

@caohy1988
Copy link
Copy Markdown

Summary

Fixes false-positive fork detection in the BigQuery Agent Analytics Plugin when deployed via Vertex AI Agent Engine (pickle/unpickle lifecycle).

Related: haiyuan-eng-google/BigQuery-Agent-Analytics-SDK#86 (Comcast production latency + fork warnings)

Problem

When the plugin is deployed via Agent Engine:

  1. Plugin created locally → _init_pid = os.getpid()
  2. Plugin pickled for deployment → __getstate__ sets _init_pid = 0
  3. Plugin unpickled on server → _init_pid = 0
  4. First request → _ensure_started() checks os.getpid() != self._init_pid
  5. Since os.getpid() is never 0, this always evaluates to True
  6. _reset_runtime_state() fires — tears down gRPC state, logs misleading "Fork detected (parent PID 0, child PID xx)" warning

No fork actually happened. The plugin was just unpickled. _started = False already ensures _lazy_setup() will run. The unnecessary _reset_runtime_state() adds cold-start latency and produces misleading log noise.

Fix

One-line change in _ensure_started():

# Before:
if os.getpid() != self._init_pid:

# After:
if self._init_pid != 0 and os.getpid() != self._init_pid:

This distinguishes:

  • _init_pid == 0 → unpickled, never initialized in this process → skip reset
  • _init_pid != 0 and _init_pid != os.getpid() → real fork from a different process → reset

Real forks are still caught by both:

  • os.register_at_fork(after_in_child=_after_fork_in_child) (line 108)
  • This PID check (for environments without register_at_fork)

Test plan

  • 215 tests pass, 0 regressions
  • test_no_reset_after_unpickle — unpickled plugin (_init_pid=0) does NOT trigger _reset_runtime_state
  • test_reset_on_real_fork — plugin with a real non-zero _init_pid from a different process DOES trigger _reset_runtime_state

🤖 Generated with Claude Code

…plugin

When the plugin is deployed via Vertex AI Agent Engine, it is pickled
for transport and unpickled on the server.  __getstate__ sets
_init_pid = 0 as a pickle sentinel.  On the server, _ensure_started()
checks os.getpid() != self._init_pid, which always evaluates to True
since os.getpid() is never 0.  This triggers _reset_runtime_state()
on every cold start even though no fork happened, producing a
misleading "Fork detected (parent PID 0, child PID xx)" warning and
adding unnecessary startup latency from tearing down and re-creating
gRPC state that was already clear.

The fix distinguishes "unpickled, never initialized" (_init_pid == 0)
from "forked from a different process" (_init_pid != 0 and
_init_pid != os.getpid()).  Real forks are still detected by both
os.register_at_fork (line 108) and this PID check.

Related: haiyuan-eng-google/BigQuery-Agent-Analytics-SDK#86

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@google-cla
Copy link
Copy Markdown

google-cla Bot commented Apr 28, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@adk-bot adk-bot added the services [Component] This issue is related to runtime services, e.g. sessions, memory, artifacts, etc label Apr 28, 2026
@adk-bot
Copy link
Copy Markdown
Collaborator

adk-bot commented Apr 28, 2026

Response from ADK Triaging Agent

Hello @caohy1988, thank you for creating this PR!

This PR is a bug fix. Per our contribution guidelines, could you please create a GitHub issue in this repository and associate it with this PR?

This will help us track the bug and the fix more effectively. Thanks!

After _lazy_setup succeeds, set _init_pid = os.getpid() when it was
the pickle sentinel (0).  Without this, an unpickled plugin keeps
_init_pid == 0 forever, disabling the PID-based fork check for the
rest of the instance's lifetime.

Also fix test_reset_on_real_fork to use max(os.getpid() - 1, 1)
instead of hardcoded 99999.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sasha-gitg
Copy link
Copy Markdown
Collaborator

Please do not open PR drafts

@sasha-gitg sasha-gitg closed this May 4, 2026
caohy1988 added a commit to caohy1988/adk-python that referenced this pull request May 4, 2026
Two fixes to the BigQuery Agent Analytics Plugin:

1. **False-positive fork detection after pickle (google#5528):** When
   deployed via Vertex AI Agent Engine, __getstate__ sets
   _init_pid = 0.  On the server, _ensure_started() checked
   os.getpid() != 0 which always triggered _reset_runtime_state(),
   producing misleading "Fork detected" warnings and adding cold-start
   latency.  Fix: skip reset when _init_pid == 0 (pickle sentinel),
   and record os.getpid() after successful startup so fork detection
   works for the rest of the instance lifetime.

2. **GCS text offload byte/character unit mismatch (google#5561):** The
   offload decision mixed inline_text_limit (32KB, byte-based) and
   max_content_length (character-based) in a single min() comparison.
   For multi-byte text (CJK, emoji), this produced false offloads.
   Fix: evaluate each limit in its own unit — byte_len vs
   inline_text_limit, char_len vs max_length — offload if either
   is exceeded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
caohy1988 added a commit to caohy1988/adk-python that referenced this pull request May 4, 2026
…ponse logging

Three fixes to the BigQuery Agent Analytics Plugin:

1. **False-positive fork detection after pickle (google#5528):** When
   deployed via Vertex AI Agent Engine, __getstate__ sets
   _init_pid = 0.  On the server, _ensure_started() checked
   os.getpid() != 0 which always triggered _reset_runtime_state(),
   producing misleading "Fork detected" warnings and adding cold-start
   latency.  Fix: skip reset when _init_pid == 0 (pickle sentinel),
   and record os.getpid() after successful startup so fork detection
   works for the rest of the instance lifetime.

2. **GCS text offload byte/character unit mismatch (google#5561):** The
   offload decision mixed inline_text_limit (32KB, byte-based) and
   max_content_length (character-based) in a single min() comparison.
   For multi-byte text (CJK, emoji), this produced false offloads.
   Fix: evaluate each limit in its own unit — byte_len vs
   inline_text_limit, char_len vs max_length — offload if either
   is exceeded.

3. **Missing final agent response in BQ (issue google#87):** The plugin
   logged LLM_RESPONSE (pre-callback raw output) and AGENT_COMPLETED
   (latency only, no content), but never captured the actual final
   response delivered to the user after callback modifications.
   Fix: detect final response events in on_event_callback via a
   strict guard (is_final_response + no function calls/responses +
   no long-running tool IDs) and log as AGENT_RESPONSE with
   source_event_author attribution from event.author.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
caohy1988 added a commit to caohy1988/adk-python that referenced this pull request May 5, 2026
…ponse logging

Three fixes to the BigQuery Agent Analytics Plugin:

1. **False-positive fork detection after pickle (google#5528):** When
   deployed via Vertex AI Agent Engine, __getstate__ sets
   _init_pid = 0.  On the server, _ensure_started() checked
   os.getpid() != 0 which always triggered _reset_runtime_state(),
   producing misleading "Fork detected" warnings and adding cold-start
   latency.  Fix: skip reset when _init_pid == 0 (pickle sentinel),
   and record os.getpid() after successful startup so fork detection
   works for the rest of the instance lifetime.

2. **GCS text offload byte/character unit mismatch (google#5561):** The
   offload decision mixed inline_text_limit (32KB, byte-based) and
   max_content_length (character-based) in a single min() comparison.
   For multi-byte text (CJK, emoji), this produced false offloads.
   Fix: evaluate each limit in its own unit — byte_len vs
   inline_text_limit, char_len vs max_length — offload if either
   is exceeded.

3. **Missing final agent response in BQ (issue google#87):** The plugin
   logged LLM_RESPONSE (pre-callback raw output) and AGENT_COMPLETED
   (latency only, no content), but never captured the final response
   events emitted by agents after callback modifications.  Fix: detect
   final response events in on_event_callback via a strict guard and
   log as AGENT_RESPONSE with source_event_author from event.author.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
caohy1988 added a commit to caohy1988/adk-python that referenced this pull request May 5, 2026
…ponse logging

Three fixes to the BigQuery Agent Analytics Plugin:

1. **False-positive fork detection after pickle (google#5528):** When
   deployed via Vertex AI Agent Engine, __getstate__ sets
   _init_pid = 0.  On the server, _ensure_started() checked
   os.getpid() != 0 which always triggered _reset_runtime_state(),
   producing misleading "Fork detected" warnings and adding cold-start
   latency.  Fix: skip reset when _init_pid == 0 (pickle sentinel),
   and record os.getpid() after successful startup so fork detection
   works for the rest of the instance lifetime.

2. **GCS text offload byte/character unit mismatch (google#5561):** The
   offload decision mixed inline_text_limit (32KB, byte-based) and
   max_content_length (character-based) in a single min() comparison.
   For multi-byte text (CJK, emoji), this produced false offloads.
   Fix: evaluate each limit in its own unit — byte_len vs
   inline_text_limit, char_len vs max_length — offload if either
   is exceeded.

3. **Missing final agent response in BQ (issue google#87):** The plugin
   logged LLM_RESPONSE (pre-callback raw output) and AGENT_COMPLETED
   (latency only, no content), but never captured the final response
   events emitted by agents after callback modifications.  Fix: detect
   final response events in on_event_callback via a strict guard and
   log as AGENT_RESPONSE with source_event_author from event.author.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
caohy1988 added a commit to caohy1988/adk-python that referenced this pull request May 5, 2026
…ponse logging

Three fixes to the BigQuery Agent Analytics Plugin:

1. **False-positive fork detection after pickle (google#5528):** When
   deployed via Vertex AI Agent Engine, __getstate__ sets
   _init_pid = 0.  On the server, _ensure_started() checked
   os.getpid() != 0 which always triggered _reset_runtime_state(),
   producing misleading "Fork detected" warnings and adding cold-start
   latency.  Fix: skip reset when _init_pid == 0 (pickle sentinel),
   and record os.getpid() after successful startup so fork detection
   works for the rest of the instance lifetime.

2. **GCS text offload byte/character unit mismatch (google#5561):** The
   offload decision mixed inline_text_limit (32KB, byte-based) and
   max_content_length (character-based) in a single min() comparison.
   For multi-byte text (CJK, emoji), this produced false offloads.
   Fix: evaluate each limit in its own unit — byte_len vs
   inline_text_limit, char_len vs max_length — offload if either
   is exceeded.

3. **Missing final agent response in BQ (issue google#87):** The plugin
   logged LLM_RESPONSE (pre-callback raw output) and AGENT_COMPLETED
   (latency only, no content), but never captured the final response
   events emitted by agents after callback modifications.  Fix: detect
   final response events in on_event_callback via a strict guard and
   log as AGENT_RESPONSE with source_event_author from event.author.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

services [Component] This issue is related to runtime services, e.g. sessions, memory, artifacts, etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants